Global WF DM1 using BTensor library #146
Draft
This is a draft for a PR to add my BTensor library as a dependency to Vayesta, in order to simplify certain functions, such as the global WF -> DM routines (I developed the library for this purpose). A short introduction to the library can be found here, and a more practical example here.
Improved Contractions
The $T_2 \times L_2$ contractions currently look like this:
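Schematically, the pattern is as in the following plain-NumPy sketch (hypothetical array names and shapes; signs and prefactors omitted). The verbose part is that the cross-cluster overlap matrices have to be built and inserted by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
nao, no, nv = 20, 3, 5

# Hypothetical stand-ins for the data of a cluster pair (x, y):
ovlp_ao = np.eye(nao)                                          # AO overlap matrix
cx_occ, cx_vir = rng.random((nao, no)), rng.random((nao, nv))  # cluster x coefficients
cy_occ, cy_vir = rng.random((nao, no)), rng.random((nao, nv))  # cluster y coefficients
theta_x = rng.random((no, no, nv, nv))                         # T2-like amplitudes of x
l2_y = rng.random((no, no, nv, nv))                            # L2 amplitudes of y

# Cross-cluster overlap matrices, built explicitly:
s_occ = cx_occ.T @ ovlp_ao @ cy_occ
s_vir = cx_vir.T @ ovlp_ao @ cy_vir

# T2 x L2 contraction, with the overlaps inserted by hand between the
# x- and y-cluster indices:
doo_xy = np.einsum('ijab,jJ,aA,bB,IJAB->iI', theta_x, s_occ, s_vir, s_vir, l2_y)
```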
With the btensor library, this simplifies to an `einsum` over the `theta` and `l2` tensor objects, plus some additional setup to initialize them. Overlap matrices between cluster bases are automatically added by the library.
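A minimal sketch of what this can look like, reusing the arrays from the sketch above (hedged: `Basis`, `Tensor`, and `make_subbasis` are my reading of the BTensor API and the exact signatures may differ):

```python
import btensor

# Root AO basis and cluster bases derived from it via their AO coefficients:
ao = btensor.Basis(nao, metric=ovlp_ao)     # assumed constructor
occ_x = ao.make_subbasis(cx_occ)            # assumed derived-basis method
vir_x = ao.make_subbasis(cx_vir)
occ_y = ao.make_subbasis(cy_occ)
vir_y = ao.make_subbasis(cy_vir)

# Tensor objects that carry their bases:
theta = btensor.Tensor(theta_x, basis=(occ_x, occ_x, vir_x, vir_x))
l2 = btensor.Tensor(l2_y, basis=(occ_y, occ_y, vir_y, vir_y))

# No explicit overlap matrices: since j/a/b refer to different cluster
# bases in the two tensors, the library inserts the overlaps automatically.
doo_xy = btensor.einsum('ijab,Ijab->iI', theta, l2)
```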
It also supports contraction over an intermediately constructed and pruned "intersection" basis, via the `intersect_tol` argument to `einsum`. This avoids having the `svd_tol` logic in the 1-DM routine. Note, however, that with a finite `svd_tol` the results will differ slightly from the original implementation for some fairly technical reasons (but as far as I can tell they are not worse when compared to the non-approximated DM, i.e. both are valid approximations).
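To illustrate the idea behind the pruned intersection basis (my own NumPy sketch, not library code): truncating the SVD of a cross-cluster overlap matrix yields a smaller intermediate index over which the contraction can run:

```python
import numpy as np

def intersection_factors(s_xy, tol):
    """Factorize an overlap matrix s_xy ~= rx @ ry.T over a pruned
    "intersection" basis P: singular directions with singular value
    below tol contribute negligibly and are dropped."""
    u, sv, vt = np.linalg.svd(s_xy, full_matrices=False)
    keep = sv >= tol
    rx = u[:, keep] * np.sqrt(sv[keep])    # x-cluster -> intersection
    ry = vt[keep].T * np.sqrt(sv[keep])    # y-cluster -> intersection
    return rx, ry

# The contraction theta[..., j, ...] s_xy[j, l] l2[..., l, ...] then becomes
# (theta @ rx)[..., P, ...] (l2 @ ry)[..., P, ...] over the smaller index P.
```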
Dealing with Symmetry

Contributions from symmetry-derived fragments `fx` are currently included like this:
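(A schematic reconstruction rather than the actual diff; the generator call follows the description below, while the accumulation step is elided.)

```python
# Tuple of arrays to transform, and for each array the axis along which
# the symmetry operation of the child fragment acts:
arrays = (fx.cluster.c_occ, fx.cluster.c_vir, doox, dvvx)

for fx2, (cx2_occ, cx2_vir, doox2, dvvx2) in fx.loop_symmetry_children(
        arrays, axis=[0, 0, 1, 1]):
    # cx2_occ etc. are the symmetry-transformed arrays belonging to the
    # child fragment fx2; they get rotated into the global MO basis and
    # accumulated into the final density matrix (omitted here).
    ...
```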
This is difficult to understand, but it means the following: loop over all fragments `fx2` deriving from `fx` (via the `loop_symmetry_children` generator), take the tuple of arrays `(fx.cluster.c_occ, fx.cluster.c_vir, doox, dvvx)`, and perform the respective symmetry operations along the axes `[0, 0, 1, 1]`, respectively. The results of this (named `cx2_occ`, etc.) will then be transformed accordingly and added to the final DM.

For nested symmetries, say translations + rotations, the generator will automatically keep the intermediately transformed arrays stored (for example, an array that has been translated, but not yet rotated), to avoid performing the same operations more than once.
In the btensor version, we deal with the symmetry a bit differently:
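Schematically (again a hedged reconstruction: `replace_basis` is the method discussed below, while the change-of-basis call and the `gen_ao_sym` generator name are my own placeholders):

```python
# Transform the desired quantities into the AO basis first:
doox3 = doox.change_basis((ao, ao))   # assumed change-of-basis method
dvvx3 = dvvx.change_basis((ao, ao))

for ao_sym in gen_ao_sym():           # symmetry-transformed AO bases (below)
    # Active transformation: the array data stays the same, only its basis
    # is replaced (in place) by the symmetry-transformed AO basis:
    doox3.replace_basis((ao_sym, ao_sym))
    dvvx3.replace_basis((ao_sym, ao_sym))
    # doo/dvv live in the occupied/virtual MO bases; the required
    # back-transformation is performed automatically on addition:
    doo += doox3
    dvv += dvvx3
```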
Here, we first transform the desired quantity into the AO basis and then loop over a set of symmetry-transformed `ao_sym` bases, using the `replace_basis` method to replace the basis of the `doox3` and `dvvx3` tensors in place (an active transformation). We can then add the result to `doo` and `dvv` (which are tensors in the occupied/virtual MO basis), since the library will perform the required back transformation automatically.

I think this way of dealing with symmetry is definitely easier to understand, but I'm not yet fully sure about the performance implications. At the moment, `ao` and `ao_sym` are related via full $N_\text{AO} \times N_\text{AO}$ matrices (except for translations, which are pure permutations), so the transformations will scale as $N_\text{AO}^3$. In principle, all these matrices should be sparse and block diagonal, since only the rotations between the angular-momentum orbitals around their center lead to contributions that cannot be described by a permutation; so it should be possible to deal with the transformation more efficiently for large matrices.
For completeness, the `ao_sym` generator looks like this:
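(A hypothetical sketch of its shape only; the real generator builds the full AO-space matrix of each symmetry operation, which for translations is a pure permutation.)

```python
def gen_ao_sym():
    """Yield one AO basis per symmetry operation, related to the plain
    ao basis by the full (nao, nao) matrix of that operation."""
    for op in symmetry_operations:        # hypothetical list of operations
        u = op.to_ao_matrix()             # hypothetical: (nao, nao) matrix,
                                          # a pure permutation for translations
        yield ao.make_subbasis(u)         # assumed: same-size derived basis
```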
Performance and Testing
So far, performance seems to be looking good, especially without using the SVD approximation. This is for 4 $\times$ $\text{H}_2\text{O}$ in cc-pVQZ, without `svd_tol` (time in seconds):

and with `svd_tol = 1e-3`:

The script for this can be found here.
However, I think we need to perform a bit more testing around performance before we commit to this way of handling basis transformations: especially for large systems, with and without the SVD tolerance, with and without symmetry, and with MPI, paying attention to both runtime and memory. Maybe we should also confirm in more detail that this new SVD approximation is not worse than the current one, when comparing to the non-approximated output. We might also have to reassess the domain where SVD makes sense, given that non-SVD is faster for the example here (we still have the possibility to skip cluster pairs based on their overlap).